Document supported databricks_retry_args usage for deferrable Databricks operators#68017
Document supported databricks_retry_args usage for deferrable Databricks operators#68017kosiew wants to merge 2 commits into
databricks_retry_args usage for deferrable Databricks operators#68017Conversation
…ferrable operators Add "Retry args in deferrable mode" subsection under DatabricksSubmitRunDeferrableOperator and DatabricksRunNowDeferrableOperator explaining: - Serialization requirement: only plain Python primitives allowed across the trigger boundary - Supported shapes (int/float primitives, nested plain-dict) - Unsupported shapes (Tenacity objects, callables) with note that a ValueError is raised at task submission - Recommended workaround: use non-deferrable mode for custom retry strategies Also update changelog for 7.16.0.
8b7caa8 to
283df3e
Compare
|
cc @moomindani for review |
moomindani
left a comment
There was a problem hiding this comment.
Thanks for the follow-up — this is exactly the docs fix scoped at the end of the #64960 thread, and the structure of the two new sections is clean (heading levels, anchors, and code blocks all check out). However, two of the content claims do not match what the merged code actually does, and two files should not be in the diff:
- The "Supported (serialization-safe and runtime-valid)" example
{"reraise": True}is not runtime-valid:BaseDatabricksHookreplaces (does not merge) the defaultretry_args, sostop/waitare lost and tenacity falls back tostop_never+wait_none()— an infinite, zero-backoff retry loop. Settingdatabricks_retry_argsat all also makesretry_limit/retry_delayno-ops. Since tenacity strategy objects (the only way to setstop/waitinsideretry_args) are exactly what deferrable mode rejects, the honest guidance is "do not usedatabricks_retry_argsin deferrable mode; useretry_limit/retry_delay", rather than a supported example. - "will raise ValueError at task submission" overstates the timing: validation runs in the trigger constructors at defer time, after
submit_run/run_nowhas already launched the Databricks run. (The #64960 thread did say "before any Databricks API call", but re-checking the merged code, the operator path submits first and defers second —operators/databricks.py:834-836.) generated/provider_dependencies.json.sha256sumis a stale-base rebase artifact — please rebase onto currentmainand drop it.- The hand-added changelog entry should be removed (release-manager-maintained file; see inline).
Details inline.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
| @@ -1 +1 @@ | |||
| 2d6f34bb40832f84cb6c121237b1c5b0a05181dccface9fd171558f4df1747dc | |||
| 2ccde55d75b93c7fc2c5723fc7f74bf8995244606190c98acf005ea1f39f04ca | |||
There was a problem hiding this comment.
This file was intentionally deleted and gitignored on main by #68801 (it is auto-regenerated by breeze when needed), so this PR re-introduces a file that no longer exists upstream — and the hash edit itself is byte-for-byte identical to the already-merged #68011. It looks like the branch was cut in the window between #67080 (which mistakenly re-added the file) and #68011. Could you rebase onto current main and drop this file from the diff? This is the same stale-base issue flagged during the #64960 review round.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
|
|
||
| When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the | ||
| trigger boundary and must contain only Airflow-serializable values (plain Python primitives | ||
| such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``). |
There was a problem hiding this comment.
Two precision issues here:
- The boundary is Airflow serde, not plain primitives. Databricks: Fail fast for non-serializable retry_args in deferrable operators and triggers #64960 deliberately switched the validator from
json.dumpsback toairflow.sdk.serde.serializeprecisely because serde accepts more than primitives — the merged tests cover adatetimevalue as supported. Documenting "plain Python primitives" re-narrows what Databricks: Fail fast for non-serializable retry_args in deferrable operators and triggers #64960 resolved to keep broad; "Airflow-serde-serializable values" (with primitives as the common case) would be accurate. - Serializability is necessary but not sufficient: the validator checks values only, not whether the keys are valid
tenacity.Retryingkwargs. For example{"stop": 3}passes validation, then fails at runtime withTypeError: 'int' object is not callablewhen tenacity invokesstop(retry_state). Worth stating that passing validation does not mean the retry config is meaningful.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
| .. code-block:: python | ||
|
|
||
| # Only plain-primitive Retrying kwarg: reraise | ||
| databricks_retry_args = {"reraise": True} |
There was a problem hiding this comment.
This example is serialization-safe but not runtime-valid as documented. When retry_args is provided, BaseDatabricksHook.__init__ replaces the defaults rather than merging (hooks/databricks_base.py:151-161 — only retry and after are re-injected), so stop and wait are gone and tenacity falls back to stop_never + wait_none(). On a persistently retryable API error this retries forever with zero backoff. It also contradicts the paragraph below: once databricks_retry_args is set, retry_limit/retry_delay are ignored entirely (they are only consulted in the else branch), so the two cannot be combined. Given deferrable mode rejects the tenacity objects that would restore stop/wait, I would replace the "Supported" example with explicit guidance: in deferrable mode, leave databricks_retry_args unset and use retry_limit/retry_delay.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
| ``wait``, ``retry``, ``before``, ``after``, etc.) require tenacity | ||
| callable objects, which are not serialization-safe in deferrable mode. | ||
|
|
||
| **Not supported** in deferrable mode (will raise ``ValueError`` at task submission): |
There was a problem hiding this comment.
"At task submission" overstates the timing. The validation added by #64960 lives only in the trigger constructors (triggers/databricks.py:61,160), and the operator defers after submitting: submit_run/run_now runs first, then _handle_deferrable_databricks_operator_execution builds the trigger (operators/databricks.py:834-836, 1183-1185). So with invalid retry_args the Databricks run is already launched, and the task fails at defer time, leaving that run in flight. (And if the run is already terminal on the first poll, the trigger is never built and no error is raised at all.) Suggest: "will raise ValueError when the task defers — note the Databricks run has already been submitted at that point".
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
| databricks_retry_args = {"retry": my_custom_retry_callable} | ||
|
|
||
| If you need a custom callable retry strategy, use the non-deferrable | ||
| :class:`~airflow.providers.databricks.operators.DatabricksRunNowOperator` (``deferrable=False``). |
There was a problem hiding this comment.
The cross-reference path is missing the module segment: the class lives at airflow.providers.databricks.operators.databricks.DatabricksRunNowOperator, and operators/__init__.py has no re-exports, so this xref will not resolve. Sibling pages (sql_statements.rst, notebook.rst, task.rst) use the full path. The same short form already exists in the pre-existing intro lines of these two pages — since this PR touches both files, it would be good to fix those occurrences too.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
| Retry args in deferrable mode | ||
| ----------------------------- | ||
|
|
||
| When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the |
There was a problem hiding this comment.
Same comments as the equivalent section in run_now.rst apply to this copy: the {"reraise": True} example, the "plain primitives" framing, the "at task submission" timing, and the :class: path.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
| ~~~~~~~~ | ||
|
|
||
| * ``Fail fast for non-serializable retry_args in deferrable operators and triggers (#64960)`` | ||
| * ``Document supported retry_args shapes for deferrable Databricks operators`` |
There was a problem hiding this comment.
Please drop this entry. Per the NOTE TO CONTRIBUTORS at the top of this file, the changelog is maintained semi-automatically by the release manager, and contributor edits are only for breaking-change guidance. A hand-added line (without the (#NNNNN) suffix, and under "Features" for a doc-only change) will conflict with the generated entries at release time — the commit message alone is sufficient.
Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting
|
Small correction to one point in my review: Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting |
What does this PR do?
Follow up for #64960
Adds documentation clarifying which
databricks_retry_argsconfigurations are supported when using Databricks operators in deferrable mode.The new documentation:
databricks_retry_argsmust be serialization-safe because it is serialized across the trigger boundary whendeferrable=True.{"reraise": True}.retry_limitandretry_delayoperator parameters.stop_after_attempt,wait_incrementing, etc.) and arbitrary callables.Does this PR introduce any user-facing change?
Yes. Documentation now explicitly describes the serialization requirements and supported shapes for
databricks_retry_argsin deferrable Databricks operators, helping users avoid unsupported retry configurations.How was this patch tested?
No tests were added or modified. This PR contains documentation and changelog updates only.
Was generative AI tooling used to co-author this PR?
ChatGPT